A Time and Space Efficient Algorithm for Contextual Linear Bandits
نویسندگان
چکیده
We consider a multi-armed bandit problem where payoffs are a linear function of an observed stochastic contextual variable. In the scenario where there exists a gap between optimal and suboptimal rewards, several algorithms have been proposed that achieve O(log T ) regret after T time steps. However, proposed methods either have a computation complexity per iteration that scales linearly with T or achieve regrets that grow linearly with the number of contexts |X |. We propose an -greedy type of algorithm that solves both limitations. In particular, when contexts are variables in R, we prove that our algorithm has a constant computation complexity per iteration of O(poly(d)) and can achieve a regret ofO(poly(d) log T ) even when |X | = Ω(2). In addition, unlike previous algorithms, its space complexity scales like O(Kd) and does not grow with T .
منابع مشابه
CBRAP: Contextual Bandits with RAndom Projection
Contextual bandits with linear payoffs, which are also known as linear bandits, provide a powerful alternative for solving practical problems of sequential decisions, e.g., online advertisements. In the era of big data, contextual data usually tend to be high-dimensional, which leads to new challenges for traditional linear bandits mostly designed for the setting of low-dimensional contextual d...
متن کاملFinite-Time Analysis of Kernelised Contextual Bandits
We tackle the problem of online reward maximisation over a large finite set of actions described by their contexts. We focus on the case when the number of actions is too big to sample all of them even once. However we assume that we have access to the similarities between actions’ contexts and that the expected reward is an arbitrary linear function of the contexts’ images in the related repro...
متن کاملContextual Dueling Bandits
We consider the problem of learning to choose actions using contextual information when provided with limited feedback in the form of relative pairwise comparisons. We study this problem in the dueling-bandits framework of Yue et al. (2009), which we extend to incorporate context. Roughly, the learner’s goal is to find the best policy, or way of behaving, in some space of policies, although “be...
متن کاملBISTRO: An Efficient Relaxation-Based Method for Contextual Bandits
We present efficient algorithms for the problem of contextual bandits with i.i.d. covariates, an arbitrary sequence of rewards, and an arbitrary class of policies. Our algorithm BISTRO requires d calls to the empirical risk minimization (ERM) oracle per round, where d is the number of actions. The method uses unlabeled data to make the problem computationally simple. When the ERM problem itself...
متن کاملA Survey on Contextual Multi-armed Bandits
4 Stochastic Contextual Bandits 6 4.1 Stochastic Contextual Bandits with Linear Realizability Assumption . . . . 6 4.1.1 LinUCB/SupLinUCB . . . . . . . . . . . . . . . . . . . . . . . . . . 6 4.1.2 LinREL/SupLinREL . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 4.1.3 CofineUCB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 4.1.4 Thompson Sampling with Linear Payoffs...
متن کامل